Midterm Project: Strawberries and Chemicals

Mi Zhang, Peng Liu, Yuanming Leng, Qiannan Shen

Data Cleaning for Strawberries

  • remove empty/missing values and trim extra white space in the cells
  • split columns containing multiple items into separate columns
  • convert “MEASURED IN CWT” values to pounds by multiplying by 100
  • make extremely large values easier to read by using a log scale on “value”
  • redefine “value” as the production of strawberries
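
A minimal sketch of the cleaning steps above, run on made-up toy data. The data frame `raw`, its column names (`State`, `Item`, `Value`), and the new columns `value_lb` and `log_value` are assumptions for illustration only, not the project's actual schema or code.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data (assumed column names, invented values)
raw <- data.frame(
  State = c(" CALIFORNIA ", "FLORIDA", "OREGON", NA),
  Item  = c("PRODUCTION, MEASURED IN CWT",
            "PRODUCTION, MEASURED IN LB",
            "PRODUCTION, MEASURED IN LB",
            NA),
  Value = c("1,200", "50,000", " (D)", NA),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  # trim leading/trailing white space and collapse repeated spaces
  mutate(across(everything(), str_squish)) %>%
  # split the multi-item column into separate columns
  separate(Item, into = c("Measure", "Unit"), sep = ", ") %>%
  # strip thousands separators; non-numeric codes such as "(D)" become NA
  mutate(Value = suppressWarnings(as.numeric(gsub(",", "", Value)))) %>%
  # drop rows whose value is missing
  filter(!is.na(Value)) %>%
  mutate(
    # 1 CWT (US hundredweight) = 100 lb
    value_lb  = if_else(Unit == "MEASURED IN CWT", Value * 100, Value),
    # log scale keeps very large values readable
    log_value = log10(value_lb)
  )

print(clean)
```

Keeping the converted pounds in a separate column (`value_lb`) leaves the original `Value` available for checking, which is a design choice of this sketch rather than something stated in the slides.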

Measurement units and value

  • Our main analysis is based on the measurement units, and our main focus is on “MEASURED IN LB”
GGally::ggpairs(strawb1, columns=c(3,6), aes(color=strawb1[,3], alpha = 0.5), lower=list(combo=wrap("facethist",binwidth=0.5)))

Map

  • California and Florida have higher annual strawberry production, in pounds, than other states.

Annual value of strawberries in each state

  • The plot shows that California and Florida have increasingly used all kinds of chemicals on strawberries in recent years.
plot1("MEASURED IN LB")

Data wrangling for Strawberries and Pesticides

  • drop empty rows/columns and remove white space
  • rename the Pesticide column to chemical to match the column name in the strawberries data
  • use toupper() to capitalize all chemical names
  • use pivot_longer() to gather all toxin types and their levels into long format
  • use inner_join() to join the Pesticide and Strawberry datasets
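
The wrangling steps above could look roughly like the sketch below. Everything in it is a placeholder: the data frames `pest` and `strawb`, the chemical names, the `Carcinogen` column, and the toxicity levels are invented for illustration. Only `Bee.Toxins`, `chemical`, and the function names come from the slides.

```r
library(dplyr)
library(tidyr)

# Hypothetical pesticide table (placeholder names and levels)
pest <- data.frame(
  Pesticide  = c("chem_a", "chem_b"),
  Bee.Toxins = c("high", NA),
  Carcinogen = c(NA, "probable"),
  stringsAsFactors = FALSE
)

# Hypothetical strawberry table (placeholder values)
strawb <- data.frame(
  State    = c("CALIFORNIA", "FLORIDA", "FLORIDA"),
  chemical = c("CHEM_A", "CHEM_B", "CHEM_C"),
  value_lb = c(100, 50, 10),
  stringsAsFactors = FALSE
)

pest_long <- pest %>%
  rename(chemical = Pesticide) %>%        # match the strawberry column name
  mutate(chemical = toupper(chemical)) %>% # capitalize chemical names
  pivot_longer(
    -chemical,
    names_to = "toxin",
    values_to = "level",
    values_drop_na = TRUE                 # keep only toxins that apply
  )

# inner_join() keeps only chemicals present in both tables
joined <- inner_join(strawb, pest_long, by = "chemical")
print(joined)
```

Because `inner_join()` drops unmatched rows, a chemical that appears only in the strawberry data (here `CHEM_C`) does not reach the joined table.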

Questions

  • Which toxin type is associated with higher strawberry production values?
  • Which type of chemical is most commonly related to toxicity?

Toxin level changes over years

  • Bee toxins are associated with larger strawberry production values.

Further analysis for Florida

  • In Florida, insecticides account for the higher proportion.
p4 <- plot4("MEASURED IN LB", "FLORIDA")
ggplotly(p4, tooltip="y")
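
The helper `plot4()` is not shown in these slides. As a rough illustration of the kind of summary such a plot might rest on, the sketch below computes each chemical type's share of the total on invented numbers. The data frame `fl` and its columns `type` and `value_lb` are hypothetical.

```r
library(dplyr)

# Hypothetical Florida rows (invented values, assumed column names)
fl <- data.frame(
  type     = c("INSECTICIDE", "INSECTICIDE", "FUNGICIDE", "HERBICIDE"),
  value_lb = c(30, 30, 30, 10),
  stringsAsFactors = FALSE
)

props <- fl %>%
  group_by(type) %>%
  summarise(total = sum(value_lb), .groups = "drop") %>%
  mutate(prop = total / sum(total)) %>%   # share of the overall total
  arrange(desc(prop))                     # largest share first

print(props)
```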

Bee Toxin

  • Looking solely at bee toxins, insecticides account for a higher proportion of strawberry production value in California.
p5 <- plot5("MEASURED IN LB","Bee.Toxins","CALIFORNIA")
ggplotly(p5, tooltip="y")

Further Analysis for Florida

  • This confirms that insecticides are more commonly related to toxicity.
p5 <- plot5("MEASURED IN LB","Bee.Toxins","FLORIDA")
ggplotly(p5, tooltip="y")

Conclusion

  • California and Florida have higher production, and more data were collected from those two states, according to the Shiny display.
  • Bee toxins are associated with higher strawberry production values than human toxins.
  • Insecticides are more commonly related to toxicity than fungicides.

Thanks

  • Professor Haviland
  • TA Bruce
  • Our lovely MA-615 Classmates
  • Our teammates

Citations